Description
Previously, when running with 'rdma_mpi' set to true, I would get illegal buffer pointer values in the halo communication routines. After a small change that makes sense to me, CUDA-aware MPI now appears to work on both Delta and Phoenix.
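For readers unfamiliar with the option: the sketch below is not the actual change in this PR, just a minimal illustration in C (with made-up names such as halo_exchange, d_send, and d_recv) of the general pattern an rdma_mpi-style switch controls. With CUDA-aware MPI the device halo buffers are handed to MPI directly; without it, they are staged through host memory, which is exactly the H2D/D2H traffic visible in the profiles discussed below.

/*
 * Hedged sketch of a halo exchange with and without CUDA-aware MPI.
 * Names and sizes are illustrative; this is not MFC's implementation.
 */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdlib.h>

void halo_exchange(double *d_send, double *d_recv, int count,
                   int neighbor, int use_rdma_mpi, MPI_Comm comm)
{
    MPI_Request reqs[2];

    if (use_rdma_mpi) {
        /* CUDA-aware MPI: pass device pointers straight to MPI.
         * No cudaMemcpy calls, so no H2D/D2H copies appear in the profile. */
        MPI_Irecv(d_recv, count, MPI_DOUBLE, neighbor, 0, comm, &reqs[0]);
        MPI_Isend(d_send, count, MPI_DOUBLE, neighbor, 0, comm, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    } else {
        /* Fallback: stage the exchange through host buffers. */
        double *h_send = (double *) malloc(count * sizeof(double));
        double *h_recv = (double *) malloc(count * sizeof(double));

        cudaMemcpy(h_send, d_send, count * sizeof(double),
                   cudaMemcpyDeviceToHost);                  /* D2H copy */
        MPI_Irecv(h_recv, count, MPI_DOUBLE, neighbor, 0, comm, &reqs[0]);
        MPI_Isend(h_send, count, MPI_DOUBLE, neighbor, 0, comm, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        cudaMemcpy(d_recv, h_recv, count * sizeof(double),
                   cudaMemcpyHostToDevice);                  /* H2D copy */

        free(h_send);
        free(h_recv);
    }
}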
Type of change
Please delete options that are not relevant.
Scope
If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration.
I built and ran two cases on both Phoenix and Delta that differed only in the 'rdma_mpi' setting: the 3D_TGV case on Delta and the 2D_riemman test on Phoenix. Running h5diff on the resulting HDF5 files from the last step showed no differences on Phoenix. Additionally, I collected nsys reports on Phoenix to show the absence of H2D/D2H copies.
First, without rdma_mpi, the code moves the GPU data to the CPU before the MPI exchange:

With rdma_mpi enabled, there are no H2D/D2H copies. This led to a marginal improvement in the RHS-MPI time for this case:

If your code changes any code source files (anything in src/simulation)
To make sure the code is performing as expected on GPU devices, I have:
- Enclosed the new feature via nvtx ranges so that they can be identified in profiles (a sketch of such a range follows below).
- Run ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR.
- Run ./mfc.sh run XXXX --gpu -t simulation --omniperf, and have attached the output file and plain text results to this PR.
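Regarding the nvtx item above, here is a minimal sketch (in C, with an illustrative function and range name, not MFC's actual instrumentation) of how a region gets a named range that shows up in an nsys timeline; it links against the NVTX library (e.g., -lnvToolsExt).

/* Hedged sketch: label a region so it appears by name in nsys profiles. */
#include <nvToolsExt.h>

void rhs_mpi_step(void)
{
    nvtxRangePushA("RHS-MPI");   /* open a named range visible in nsys */
    /* ... halo exchange and RHS work ... */
    nvtxRangePop();              /* close the range */
}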